Search for: All records

Editors contains: "Dasgupta, Sanjoy"

Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (an administrative interval).

Some links on this page may take you to non-federal websites, whose policies may differ from those of this site.

  1. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
    Spatial evolutionary games are used to model large systems of interacting agents. In earlier work, a method was developed using Bayesian Networks to approximate the population dynamics in these games. One advantage of that approach is that one can smoothly adjust the size of the network to get more accurate approximations. However, scaling the method up can be intractable if the number of strategies in the evolutionary game increases. In this paper, we propose a new method for computing more accurate approximations by using surrogate Bayesian Networks. Instead of doing inference on larger networks directly, we do it on a much smaller surrogate network extended with parameters that exploit the symmetry inherent to the domain. We learn the parameters on the surrogate network using KL-divergence as the loss function. We illustrate the value of this method empirically through a comparison on several evolutionary games. 
    Free, publicly-accessible full text available May 2, 2026
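    A rough sketch of the fitting idea described in this abstract (not the paper's surrogate-network construction): fit a softmax-parameterized surrogate distribution to a fixed reference distribution by gradient descent on the KL divergence. The state-space size, reference distribution p_ref, learning rate, and iteration count are all illustrative assumptions.

        # Minimal sketch (not the paper's method): fit a parameterized surrogate
        # distribution q_theta to a reference distribution p over a small discrete
        # state space by minimizing KL(p || q_theta) with gradient descent.
        import numpy as np

        rng = np.random.default_rng(0)
        n_states = 8                              # toy number of joint strategy configurations
        p_ref = rng.dirichlet(np.ones(n_states))  # stands in for the large-network marginals

        theta = np.zeros(n_states)                # surrogate parameters (softmax logits)
        lr = 0.5
        for step in range(500):
            q = np.exp(theta - theta.max())
            q /= q.sum()                          # surrogate distribution q_theta
            theta -= lr * (q - p_ref)             # d KL(p || q_theta) / d theta for softmax q

        kl = np.sum(p_ref * (np.log(p_ref) - np.log(q)))
        print(f"KL(p || q_theta) after fitting: {kl:.6f}")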
  2. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
    The problem of quickest detection of a change in the distribution of streaming data is considered. It is assumed that the pre-change distribution is known, while the only information about the post-change distribution comes from a (small) set of labeled data. This post-change data is used in a data-driven minimax robust framework, in which an uncertainty set for the post-change distribution is constructed. The robust change detection problem is studied in an asymptotic setting where the mean time to false alarm goes to infinity. It is shown that the least favorable distribution (LFD) is an exponentially tilted version of the pre-change density and can be obtained efficiently. A Cumulative Sum (CuSum) test based on the LFD, referred to as the distributionally robust (DR) CuSum test, is then shown to be asymptotically robust. The results are extended to the case of multiple post-change uncertainty sets and validated using synthetic and real data examples.
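    To make the construction above concrete, here is a minimal sketch (illustrative only, not the paper's derivation): a CuSum test whose post-change model is an exponentially tilted version of a N(0, 1) pre-change density, f_lam(x) = f_0(x) exp(lam*x - lam^2/2), which is N(lam, 1). The tilt lam, threshold b, change point, and true post-change shift are illustrative assumptions.

        # CuSum test driven by the log-likelihood ratio of the tilted density
        # against the pre-change density: log f_lam(x)/f_0(x) = lam*x - lam^2/2.
        import numpy as np

        rng = np.random.default_rng(1)
        lam, b = 0.5, 8.0                     # tilt parameter and alarm threshold (assumed)
        change_point, true_shift = 200, 0.7   # data-generating change (assumed)

        x = np.concatenate([rng.normal(0.0, 1.0, change_point),
                            rng.normal(true_shift, 1.0, 300)])

        w, alarm = 0.0, None
        for t, xt in enumerate(x):
            llr = lam * xt - 0.5 * lam**2     # log-likelihood ratio at sample t
            w = max(0.0, w + llr)             # CuSum recursion
            if w >= b:
                alarm = t
                break

        print("alarm raised at sample:", alarm, "(change introduced at", change_point, ")")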
  3. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
    Cyclical MCMC is a novel MCMC framework recently proposed by Zhang et al. (2019) to address the challenge posed by high-dimensional multimodal posterior distributions like those arising in deep learning. The algorithm works by generating a nonhomogeneous Markov chain that tracks, cyclically in time, tempered versions of the target distribution. We show in this work that cyclical MCMC converges to the desired limit in settings where the Markov kernels used are fast mixing and sufficiently long cycles are employed. However, in the far more common setting of slow-mixing kernels, the algorithm may fail to converge to the correct limit. In particular, in a simple mixture example with unequal variances, where powering is known to produce slow-mixing kernels, we show by simulation that cyclical MCMC fails to converge to the desired limit. Finally, we show that cyclical MCMC typically estimates the local shape of the target distribution around each mode well, even when it does not converge to the target.
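    For intuition, the sketch below is a generic tempered random-walk Metropolis chain in the spirit of the description above (it is not Zhang et al.'s exact algorithm): the inverse temperature beta_t cycles between a small value and 1, so the chain alternately targets flattened and full versions of a two-mode Gaussian mixture. Cycle length, temperature range, and proposal scale are illustrative assumptions.

        import numpy as np

        def log_pi(x):
            # two-mode target: equal-weight mixture of N(-3, 1) and N(3, 1), up to a constant
            return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

        rng = np.random.default_rng(2)
        n_iters, cycle_len, beta_min = 20000, 500, 0.1
        x, samples = 0.0, []
        for t in range(n_iters):
            frac = (t % cycle_len) / cycle_len          # position within the current cycle
            beta = beta_min + (1.0 - beta_min) * frac   # inverse temperature rises toward 1
            prop = x + rng.normal(0.0, 1.0)             # symmetric random-walk proposal
            # Metropolis acceptance for the tempered target pi(x)^beta
            if np.log(rng.random()) < beta * (log_pi(prop) - log_pi(x)):
                x = prop
            if frac > 0.9:                              # keep draws taken near beta ~ 1
                samples.append(x)

        print("fraction of kept draws near the +3 mode:", np.mean(np.array(samples) > 0))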
  4. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
    Average reward reinforcement learning (RL) provides a suitable framework for capturing the objective (i.e., long-run average reward) for continuing tasks, where there is often no natural way to identify a discount factor. However, existing average reward RL algorithms with sample complexity guarantees are not feasible, as they take as input the (unknown) mixing time of the Markov decision process (MDP). In this paper, we make initial progress towards addressing this open problem. We design a feasible average-reward $$Q$$-learning framework that requires no knowledge of any problem parameter as input. Our framework is based on discounted $$Q$$-learning, while we dynamically adapt the discount factor (and hence the effective horizon) to progressively approximate the average reward. In the synchronous setting, we solve three tasks: (i) learn a policy that is $$\epsilon$$-close to optimal, (ii) estimate the optimal average reward with $$\epsilon$$-accuracy, and (iii) estimate the bias function (similar to the $$Q$$-function in the discounted case) with $$\epsilon$$-accuracy. We show that with carefully designed adaptation schemes, (i) can be achieved with $$\tilde{O}(\frac{SA t_{\mathrm{mix}}^{8}}{\epsilon^{8}})$$ samples, (ii) with $$\tilde{O}(\frac{SA t_{\mathrm{mix}}^{5}}{\epsilon^{5}})$$ samples, and (iii) with $$\tilde{O}(\frac{SA B}{\epsilon^{9}})$$ samples, where $$t_{\mathrm{mix}}$$ is the mixing time and $$B > 0$$ is an MDP-dependent constant. To our knowledge, we provide the first finite-sample guarantees that are polynomial in $$S, A, t_{\mathrm{mix}}, \epsilon$$ for a feasible variant of $$Q$$-learning. That said, the sample complexity bounds have tremendous room for improvement, which we leave for the community's best minds. Preliminary simulations verify that our framework is effective without prior knowledge of parameters as input.
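    As a toy illustration of the underlying idea (approximating the average reward through discounted Q-learning with discount factors approaching 1), the sketch below runs synchronous tabular Q-learning on a small random MDP for a fixed sequence of discount factors and reports (1 - gamma) * V_gamma, which approaches the optimal long-run average reward as gamma tends to 1. It is not the paper's adaptation scheme; the MDP size, step sizes, discount schedule, and iteration counts are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(3)
        S, A = 5, 3
        P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel P[s, a, s']
        R = rng.random((S, A))                      # rewards in [0, 1]

        def synchronous_q_learning(gamma, iters=10000):
            Q = np.zeros((S, A))
            for k in range(iters):
                alpha = 1.0 / (1.0 + 0.01 * k)      # diminishing step size
                # synchronous update: sample s' ~ P(.|s, a) for every (s, a)
                next_s = np.array([[rng.choice(S, p=P[s, a]) for a in range(A)]
                                   for s in range(S)])
                target = R + gamma * Q[next_s].max(axis=2)
                Q += alpha * (target - Q)
            return Q

        for gamma in (0.9, 0.99):
            Q = synchronous_q_learning(gamma)
            rho_hat = (1.0 - gamma) * Q.max(axis=1).mean()
            print(f"gamma={gamma}: estimated average reward ~ {rho_hat:.3f}")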
  5. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
    The Schrödinger bridge problem can be viewed as a continuous-time stochastic control problem in which the goal is to find an optimally controlled diffusion process whose terminal distribution coincides with a pre-specified target distribution. We propose to generalize this problem by allowing the terminal distribution to differ from the target while penalizing the Kullback-Leibler divergence between the two distributions. We call this new control problem the soft-constrained Schrödinger bridge (SSB). The main contribution of this work is a theoretical derivation of the solution to SSB, which shows that the terminal distribution of the optimally controlled process is a geometric mixture of the target and some other distribution. This result is further extended to a time series setting. One application is the development of robust generative diffusion models. We propose a score matching-based algorithm for sampling from geometric mixtures and showcase its use via a numerical example on the MNIST data set.
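    The geometric-mixture result above has a simple sampling consequence: the score of the (unnormalized) geometric mixture p^alpha * q^(1-alpha) is alpha * grad log p + (1 - alpha) * grad log q, so Langevin dynamics driven by that combined score samples from the mixture. The sketch below illustrates this with two unit-variance Gaussians whose scores are known in closed form (it is not the paper's score matching algorithm); alpha, the step size, and the chain length are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(4)
        alpha, step, n_steps, n_chains = 0.7, 0.01, 5000, 2000

        def score_p(x):   # grad log N(-2, 1)
            return -(x + 2.0)

        def score_q(x):   # grad log N(+2, 1)
            return -(x - 2.0)

        x = rng.normal(0.0, 1.0, n_chains)
        for _ in range(n_steps):
            s = alpha * score_p(x) + (1.0 - alpha) * score_q(x)   # geometric-mixture score
            x = x + step * s + np.sqrt(2.0 * step) * rng.normal(size=n_chains)

        # For two unit-variance Gaussians the geometric mixture is again Gaussian
        # with mean alpha*(-2) + (1 - alpha)*2; check the empirical mean against it.
        print("empirical mean:", x.mean(), " expected:", alpha * (-2.0) + (1.0 - alpha) * 2.0)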
  6. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
  7. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)